No-reference panoramic image quality assessment based on multi-region adjacent pixels correlation

Distortion measurement plays an important role in panoramic image processing. Most measurement algorithms judge panoramic image quality by weighting the quality of local areas. However, such a calculation fails to reflect the quality of the panoramic image globally. Therefore, this paper proposes the multi-region adjacent pixels correlation (MRAPC) as an efficient feature for no-reference panoramic image quality assessment. Specifically, from the perspective of statistical characteristics, the differences of adjacent pixels in a panoramic image are shown to be highly related to the degree of distortion and independent of image content. Besides, the difference map has a limited pixel value range, which improves the efficiency of quality assessment. Based on these advantages, the proposed MRAPC feature is combined with support vector regression to globally predict the quality of panoramic images. Extensive experimental results show that the proposed no-reference panoramic image quality assessment algorithm achieves higher evaluation performance than the existing algorithms.


Introduction
Compared with normal planar images, panoramic images are produced by image stitching and allow free viewing in all directions, which provides users with an immersive experience with the help of head-mounted displays (HMDs). In practice, operations such as stitching, blending, projecting and encoding inevitably degrade the quality of panoramic images, and can further cause discomfort and physical illness [1]. Therefore, an accurate quality assessment algorithm for panoramic images is urgently needed for future immersive applications.
The existing quality assessment algorithms consist of full-reference image quality assessment methods and no-reference image quality assessment methods [2]. The traditional full-reference image quality assessment methods aim to address the problem of non-uniform sampling by adapting peak signal-to-noise ratio (PSNR) or structural similarity (SSIM) [3]. Yu et al. [4] proposed a method to calculate PSNR on a spherical surface, called S-PSNR (spherical PSNR). Sun et al. [5] proposed the weighted-to-spherically-uniform PSNR (WS-PSNR), which assigns different weighting maps to different reprojection forms. Zakharchenko et al. [6] presented Craster parabolic projection PSNR (CPP-PSNR), which has the least shape distortion. In addition to PSNR, Zhou et al. [7] proposed weighted-to-spherically-uniform SSIM (WS-SSIM), which is similar to WS-PSNR. Mittal et al. [8] proposed the blind/referenceless image spatial quality evaluator (BRISQUE), a widely used no-reference image quality assessment algorithm. Most of these assessment methods are developed on the basis of PSNR and SSIM. However, these metrics can neither capture changes in structural components nor distinguish different types of distortion [9]. Dosselmann and Yang [10] demonstrated that a large gap exists between the PSNR or SSIM score and human perception. The results on subjective assessment databases also show that, although these algorithms improve on the classical metrics to some extent, their overall performance is still unsatisfactory.
With regard to no-reference image quality assessment, Kim et al. [11,12] proposed a model based on generative adversarial networks, which is divided into a prediction part and a guidance part. The prediction part estimates the image quality score from the image content information and location information, and the guidance part judges whether a score is a manual score or the output of the prediction network. These two parts are adversarial, which iteratively improves the performance of the prediction network. In practice, however, participants watch panoramic images through viewports rather than small blocks. Li et al. [13] proposed a view-based convolutional neural network (V-CNN) method. This method first uses a CNN and non-maximum suppression to calculate the importance weight of each view, and then projects the view image onto the plane. Subsequently, the method calculates the quality scores of the saliency map and the view, and obtains the final quality score by weighting. The method proposed by Yang et al. [14] fuses multi-level saliency map features based on weighted PSNR and uses a neural network, but the network can only learn the mapping between weighted PSNR and the mean opinion score (MOS) during training, which is unstable and difficult to learn. Deep learning can extract many high-dimensional features for quality evaluation, which generally makes its performance better than that of traditional methods. However, deep learning-based methods depend on the quality scores of image patches and cannot perceive the quality of the panoramic image globally.
The main problem of existing methods is that they do not consider the global characteristics of panoramic images, which degrades the precision of quality assessment. For no-reference panoramic image quality assessment, the performance depends largely on feature extraction. In other words, features that globally reflect the degree of distortion are suitable for panoramic image quality assessment. By studying the statistical characteristics of panoramic images, we conclude that the features should have three characteristics: 1) The features should have low correlation with the content of the image. In the no-reference task, if the correlation between the feature and the image content is low, the distorted image content does not affect the feature used for assessment, so the assessment is robust; 2) The features should be closely related to the level of distortion. In this way, the features can precisely reflect the distortion without requiring a reference; 3) The feature size should be as small as possible. The size of the feature has a great influence on no-reference assessment performance: when the feature dimension is much larger than the number of samples, the no-reference performance is poor.
Based on the observations above, we judge the degree of panoramic image distortion through the correlation of adjacent pixels, and propose a no-reference quality assessment method based on the multi-region adjacent pixels correlation (MRAPC) features. The proposed method calculates the image quality score globally and achieves better performance. The contributions of this paper are as follows: 1. This paper analyzes the statistical characteristics of panoramic images and shows that panoramic image distortion is highly related to the correlation of adjacent pixels. Therefore, a feature that globally expresses the correlation between adjacent pixels is well suited to quality assessment of coding distortion; 2. This paper proposes a no-reference panoramic image quality assessment method based on the correlation of adjacent pixels in multiple regions. In this way, the global characteristics are taken into account in the objective quality assessment; 3. This paper reduces the feature dimension of the panoramic image by shrinking the pixel value range, which significantly improves computational efficiency.
The rest of this paper is organized as follows. The overview of efficient feature extraction for panoramic images is presented in Section 2. Section 3 describes the proposed panoramic image quality assessment algorithm. Section 4 shows and analyzes the experimental results. Finally, the conclusions are drawn in Section 5.

The efficient feature for panoramic images
To obtain the expected features mentioned above, we conduct extensive experiments on panoramic images to study the statistical characteristics of their pixels. Fig 1 shows the probability distribution of adjacent pixel pairs of 91 panoramic images in the CVIQD [15], OIQA [16] and VQA-ODV [17] datasets, where $I_{i,j}$ represents the pixel value of the panoramic image at coordinate (i, j). It can be seen from Fig 1 that the values of adjacent pixels in panoramic images are very close, which reveals that the energy in the panoramic image is mainly concentrated in the low frequencies.
To this end, this paper analyzes the proportion of high-frequency information in the panoramic image by using various difference operators. A difference operator can be regarded as a high-pass filter, which suppresses the low-frequency information and exposes the high-frequency information in the panoramic image. Taking the horizontal difference operator [−1, 1] (see Table 1) as an example, its output is the difference of horizontally adjacent pixels in the panoramic image, i.e., $I_{i,j+1} - I_{i,j}$. We utilize the Pearson coefficient to calculate the correlations of adjacent pixels along the horizontal, vertical and diagonal directions, respectively. It can be seen from the above experiments that the statistical characteristics of the difference results are not only independent of panoramic image content, but also related to the degree of distortion. Consequently, this statistical characteristic is the expected indicator for measuring the distortion degree of the panoramic image.
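The directional correlation measurement described above can be illustrated with a minimal Python sketch. The function names and the toy image are assumptions made for illustration only; they are not taken from the original implementation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def adjacent_correlation(img, di, dj):
    """Correlate each pixel img[i][j] with its neighbour img[i+di][j+dj].

    (di, dj) = (0, 1) is horizontal, (1, 0) vertical, (1, 1) diagonal.
    Only non-negative offsets are handled in this sketch.
    """
    rows, cols = len(img), len(img[0])
    a = [img[i][j] for i in range(rows - di) for j in range(cols - dj)]
    b = [img[i + di][j + dj] for i in range(rows - di) for j in range(cols - dj)]
    return pearson(a, b)

# Hypothetical 4x4 grey-level patch, smooth except for one outlier pixel.
toy = [[10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 90, 33],
       [40, 41, 42, 43]]
horizontal = adjacent_correlation(toy, 0, 1)
vertical = adjacent_correlation(toy, 1, 0)
diagonal = adjacent_correlation(toy, 1, 1)
```

On a real panoramic image the same three calls would be applied to the full luminance array; the toy patch only demonstrates the indexing.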

Proposed method
In this section, we propose a no-reference quality assessment method based on multi-region adjacent pixel correlation (MRAPC) feature. The proposed method consists of four parts: residual calculation, residual truncation, co-occurrence matrix calculation and MRAPC-based Support Vector Regression (SVR) model. The framework is shown in Fig 5. Specifically, the MRAPC features are calculated by co-occurrence matrix, and then are fed into SVR for training and testing, and finally the objective quality assessment scores of panoramic images are obtained.

Calculation of difference map and threshold
To make the statistical characteristics more compact, this paper adopts the difference map, which confines the image content to a small range and thereby reduces the feature dimension. Specifically, the difference map is used to represent the pixel gradient. For an image of size m × n, the k-th-order difference map is denoted by $R^k$, with entries $R^k_{i,j}$. In the practical calculation process, six difference operators (see Table 1) are used to obtain six residual maps. Each difference operator captures a corresponding correlation among adjacent pixels. Taking the horizontal direction as an example, the first-order residual is calculated by $R^1_{i,j} = I_{i,j+1} - I_{i,j}$, the second-order residual by $R^2_{i,j} = I_{i,j+1} - 2I_{i,j} + I_{i,j-1}$, and the third-order residual by $R^3_{i,j} = I_{i,j-1} - 3I_{i,j} + 3I_{i,j+1} - I_{i,j+2}$. The calculation along the vertical direction is analogous to that along the horizontal direction.

Table 1. Six difference operators. They are divided into horizontal and vertical operators, and each directional set consists of a first-order, a second-order and a third-order operator. (Columns: Direction; First-order operator; Second-order operator; Third-order operator.)
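The three horizontal residual formulas can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the kernel coefficients are read off the residual formulas in the text, the helper names are hypothetical, and border pixels where the operator does not fit are simply dropped (the text does not state how borders are handled).

```python
def horizontal_residual(img, kernel, offset):
    """Apply a 1-D difference kernel along each row.

    kernel[t] multiplies img[i][j + t - offset]; positions where the
    kernel does not fit inside the row are skipped (an assumption).
    """
    out = []
    for row in img:
        n = len(row)
        last = n - (len(kernel) - 1 - offset)
        out.append([
            sum(k * row[j + t - offset] for t, k in enumerate(kernel))
            for j in range(offset, last)
        ])
    return out

def vertical_residual(img, kernel, offset):
    """Same operator applied along columns, via transposition."""
    transposed = [list(col) for col in zip(*img)]
    res = horizontal_residual(transposed, kernel, offset)
    return [list(col) for col in zip(*res)]

# (kernel, offset) pairs derived from the residual formulas in the text.
FIRST = ([-1, 1], 0)          # I[i][j+1] - I[i][j]
SECOND = ([1, -2, 1], 1)      # I[i][j-1] - 2*I[i][j] + I[i][j+1]
THIRD = ([1, -3, 3, -1], 1)   # I[i][j-1] - 3*I[i][j] + 3*I[i][j+1] - I[i][j+2]

row = [[1, 4, 9, 16, 25]]                     # hypothetical pixel values
first = horizontal_residual(row, *FIRST)      # [[3, 5, 7, 9]]
second = horizontal_residual(row, *SECOND)    # [[2, 2, 2]]
third = horizontal_residual(row, *THIRD)      # [[0, 0]]
```

The quadratic test row makes the behaviour easy to check by hand: the first-order residual is linear, the second-order residual is constant, and the third-order residual vanishes.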
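It should be noted that, since most of the residual values are concentrated in a small range, we set a threshold T = 2 and truncate the difference map R to reduce the residual range:

$$R_{i,j} \leftarrow \begin{cases} -T, & R_{i,j} < -T \\ R_{i,j}, & -T \le R_{i,j} \le T \\ T, & R_{i,j} > T \end{cases}$$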
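A minimal sketch of the truncation step, assuming it clamps every residual to the interval [−T, T] (which is consistent with the residual values {−2, −1, 0, 1, 2} used later for the co-occurrence matrix). The function name is hypothetical.

```python
def truncate(residual, T=2):
    """Clamp every residual value to the range [-T, T]."""
    return [[max(-T, min(T, r)) for r in row] for row in residual]

# Hypothetical residual row with values outside the threshold.
clipped = truncate([[-7, -2, 0, 1, 5]])   # [[-2, -2, 0, 1, 2]]
```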

Calculation of co-occurrence matrix
In this section, we use the fourth-order co-occurrence matrix to describe the pixel gradient in the residual map. As mentioned earlier, the correlation of pixels along the diagonal direction is weaker than that along the horizontal and vertical directions, so for simplicity the pixel gradient is calculated only along the horizontal and vertical directions. Taking the horizontal direction as an example, a four-dimensional hypercube $C^h_d$, generated by a fourth-order co-occurrence matrix, represents the joint probability distribution of four adjacent residuals along the horizontal direction:

$$C^h_d = \frac{1}{Z}\left|\left\{(i,j) : R_{i,j+k-1} = d_k,\ k = 1, \dots, 4\right\}\right|$$

where $d = (d_1, d_2, d_3, d_4) \in \tau^4 = \{-2, -1, 0, 1, 2\}^4$ and Z is the normalization coefficient that ensures $\sum_{d \in \tau^4} C_d = 1$. Taking $d^{*} = (0, 0, 0, 0)$ as an example, the value of $C^h_{d^{*}}$ is the joint probability that the residual point $R_{i,j}$ and its horizontally adjacent points $R_{i,j+1}$, $R_{i,j+2}$, $R_{i,j+3}$ are all equal to 0. The calculation of $C^v_d$ along the vertical direction is analogous, i.e., it is the joint probability distribution of the vertically adjacent points $R_{i,j}$, $R_{i+1,j}$, $R_{i+2,j}$, $R_{i+3,j}$. Note that each difference map of a given order corresponds to two directional co-occurrence matrices.
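The fourth-order co-occurrence statistics can be sketched as a normalized histogram over sliding windows of four adjacent truncated residuals. The sketch below is illustrative only; the function names are hypothetical, and it stores only the bins that actually occur rather than the full 625-element hypercube.

```python
from collections import Counter

def cooccurrence_horizontal(res, order=4):
    """Joint distribution of `order` horizontally adjacent residuals.

    Returns a dict mapping each observed tuple d to its probability;
    tuples that never occur are omitted (their probability is 0).
    """
    counts = Counter()
    for row in res:
        for j in range(len(row) - order + 1):
            counts[tuple(row[j:j + order])] += 1
    z = sum(counts.values())          # normalization coefficient Z
    return {d: c / z for d, c in counts.items()}

def cooccurrence_vertical(res, order=4):
    """Same statistic along columns, via transposition."""
    transposed = [list(col) for col in zip(*res)]
    return cooccurrence_horizontal(transposed, order)

# Hypothetical truncated residual map with a single row.
c_h = cooccurrence_horizontal([[0, 0, 0, 0, 0, 1]])
# Windows: (0,0,0,0), (0,0,0,0), (0,0,0,1)
```

Here the all-zero bin receives probability 2/3 and the bin (0, 0, 0, 1) receives 1/3, so the probabilities sum to 1 as required by the normalization.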

Symmetric integration of co-occurrence matrix
The four-dimensional hypercube $C_d$ has $|\tau^4| = 625$ elements in total, which leads to considerable redundancy. For efficiency, we propose to symmetrically integrate the elements in each co-occurrence matrix according to the following two rules:

$$\bar{C}_d \leftarrow C_d + C_{-d}, \qquad \bar{\bar{C}}_d \leftarrow \bar{C}_d + \bar{C}_{\overleftarrow{d}},$$

where $-d = (-d_1, -d_2, -d_3, -d_4)$ and $\overleftarrow{d} = (d_4, d_3, d_2, d_1)$. Inverting the sign of the residuals does not affect the pixel gradient, and reversing their order does not affect the pixel gradient either. After integration, the number of elements in the co-occurrence matrix is reduced from 625 to 169. The MRAPC feature is computed from six difference operators and thus has 6 × 169 = 1014 dimensions, which is lightweight. Because the pixel gradient is preserved, the symmetric integration greatly reduces the feature dimension while keeping the features effective.
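The reduction from 625 to 169 elements can be checked with a short Python sketch. It assumes that the two integration rules are sign symmetry (bins d and −d are merged) and directional symmetry (bin d and its reversal are merged); this assumption reproduces the stated count. The helper names are hypothetical.

```python
from itertools import product

def canonical(d):
    """Representative of the class {d, reversed d, -d, reversed -d}."""
    neg = tuple(-x for x in d)
    return min(d, d[::-1], neg, neg[::-1])

def symmetrize(cooc, T=2):
    """Merge co-occurrence bins that are equivalent under both symmetries.

    `cooc` maps 4-tuples to probabilities; missing bins count as 0.
    """
    merged = {}
    for d in product(range(-T, T + 1), repeat=4):
        key = canonical(d)
        merged[key] = merged.get(key, 0.0) + cooc.get(d, 0.0)
    return merged

# Hypothetical sparse co-occurrence matrix with three equivalent bins.
cooc = {(0, 0, 0, 1): 0.25, (1, 0, 0, 0): 0.25, (0, 0, 0, -1): 0.5}
feature = symmetrize(cooc)
n_bins = len(feature)                  # 169
merged_mass = feature[(-1, 0, 0, 0)]   # 1.0
```

Concatenating the symmetrized bins of the two directional matrices for each of the three residual orders then gives 6 × 169 = 1014 values, matching the feature dimension stated above.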

SVR model training and testing
SVR is a widely used form of the support vector machine [18] for solving regression problems, so we adopt SVR to obtain the final quality score of panoramic images. As shown in Fig 6, for a given sample set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$, $y_i \in \mathbb{R}$, we construct a regression model of the form $f(x) = w^{T} x + b$ so that f(x) and y are as close as possible, where w and b are the model parameters. The model tolerates a deviation of at most ε between f(x) and y; that is, a loss is incurred only when the absolute difference between f(x) and y exceeds ε. We divide the database into a training set and a test set. Then, we feed the MRAPC features (assigned as x) of the training panoramic images and the corresponding MOS (assigned as y) into SVR to obtain the trained model. Finally, the features of the test images are fed into the trained model to predict the quality scores.
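The ε-tolerance can be made concrete with a small Python sketch of the linear model and the ε-insensitive loss that SVR minimizes. It is an illustration of the loss only, not a solver, and the function names and numeric values are hypothetical.

```python
def linear_model(w, x, b):
    """f(x) = w^T x + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def epsilon_insensitive_loss(pred, target, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return max(0.0, abs(pred - target) - eps)

# Hypothetical 2-D feature vector and parameters.
pred = linear_model([0.5, 1.0], [2.0, 2.0], 0.0)       # 3.0
inside = epsilon_insensitive_loss(pred, 3.4, eps=0.5)   # 0.0, within the tube
outside = epsilon_insensitive_loss(pred, 4.0, eps=0.5)  # 0.5, outside the tube
```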

Experimental settings
In the experiments, the assessment algorithm with the proposed MRAPC feature is tested on the CVIQD [15] and OIQA [16] databases because of their rich distortions of panoramic images. CVIQD contains 16 reference panoramic images, each of which corresponds to 33 distorted images with three distortion types, namely JPEG coding distortion, AVC coding distortion and HEVC coding distortion. The JPEG quality factor ranges from 0 to 50 with an interval of 5; the quantization parameters of AVC and HEVC range from 30 to 50 with an interval of 2. OIQA consists of 16 reference panoramic images, each of which corresponds to 20 distorted images with four distortion types, namely JPEG coding distortion, JPEG2000 coding distortion, Gaussian blur and Gaussian white noise. Note that the proposed algorithm follows the recommendations given in [19] and uses a 5-parameter logistic function to fit the predicted results, as shown in Eq (5):

$$f(x) = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + e^{\beta_2 (x - \beta_3)}} \right) + \beta_4 x + \beta_5 \qquad (5)$$
where x denotes the score predicted by the proposed algorithm, $\{\beta_i \mid i = 1, 2, \dots, 5\}$ are the five parameters to be fitted, and f(x) denotes the final quality score. We use three commonly used evaluation metrics to evaluate the proposed algorithm: Spearman Rank-order Correlation Coefficient (SROCC), Pearson Linear Correlation Coefficient (PLCC) and Root Mean Squared Error (RMSE). 1) Spearman Rank-order Correlation Coefficient is calculated by:

$$\mathrm{SROCC} = 1 - \frac{6 \sum_{i=1}^{N} d_i^2}{N (N^2 - 1)}$$

where N is the number of images and $d_i$ is the difference between the ranks of the subjective score and the objective score of the i-th image. 2) Pearson Linear Correlation Coefficient is computed by:

$$\mathrm{PLCC} = \frac{\sum_{i=1}^{N} (s_i - \mu_s)(o_i - \mu_o)}{\sqrt{\sum_{i=1}^{N} (s_i - \mu_s)^2} \sqrt{\sum_{i=1}^{N} (o_i - \mu_o)^2}}$$

where N is the number of images; $s_i$ and $o_i$ denote the subjective and objective assessment scores of the i-th image; $\mu_s$ and $\mu_o$ represent the mean values of $s_i$ and $o_i$, respectively. 3) Root Mean Squared Error is calculated by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (s_i - o_i)^2}$$

where N is the number of images; $s_i$ and $o_i$ denote the subjective and objective assessment scores of the i-th image. These evaluation metrics reflect the performance of assessment algorithms from different aspects. Specifically, SROCC mainly measures the monotonicity of prediction, whereas PLCC and RMSE measure the accuracy of prediction. Note that stronger correlation and lower error indicate higher performance.
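The logistic mapping and the three metrics can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the logistic function uses the common 5-parameter form, the rank-difference SROCC formula is exact only when there are no tied scores, and all names and sample scores are hypothetical. Fitting the β parameters would require a nonlinear least-squares routine, which is not shown.

```python
from math import exp, sqrt

def logistic5(x, b1, b2, b3, b4, b5):
    """Common 5-parameter logistic mapping (assumed form)."""
    return b1 * (0.5 - 1.0 / (1.0 + exp(b2 * (x - b3)))) + b4 * x + b5

def ranks(values):
    """1-based ranks; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def srocc(s, o):
    """Spearman correlation via the rank-difference formula."""
    n = len(s)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(s), ranks(o)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def plcc(s, o):
    """Pearson linear correlation coefficient."""
    n = len(s)
    ms, mo = sum(s) / n, sum(o) / n
    num = sum((a - ms) * (b - mo) for a, b in zip(s, o))
    den = sqrt(sum((a - ms) ** 2 for a in s)) * sqrt(sum((b - mo) ** 2 for b in o))
    return num / den

def rmse(s, o):
    """Root mean squared error between subjective and objective scores."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(s, o)) / len(s))

# Hypothetical subjective (MOS) and objective scores for four images.
mos = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 1.0, 3.5, 4.5]
rank_corr = srocc(mos, pred)   # two adjacent ranks swapped: 1 - 12/60
lin_corr = plcc(mos, pred)
err = rmse(mos, pred)
```

In practice PLCC and RMSE would be computed after mapping the predicted scores through the fitted logistic function, whereas SROCC is unaffected by any monotonic mapping.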
Both the CVIQD and OIQA databases contain 16 scenes. To ensure the reliability of the results, the images of 12 randomly selected scenes are used for training, and the images of the remaining 4 scenes are used for testing. This random split is repeated 1000 times, and the medians of the SROCC scores and of the other metrics over all repetitions are taken as the final experimental results. In the experiment, the SVM function "fitrsvm" of MATLAB 2018a is used for training, where the radial basis function (RBF) is selected as the kernel function and the other parameters keep their default configurations.
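The scene-wise evaluation protocol can be sketched in Python as follows. The original experiments use MATLAB; this sketch only illustrates the splitting and median logic, and `evaluate` is a hypothetical callback that would train the regressor on the training scenes and return a metric on the test scenes.

```python
import random
from statistics import median

def scene_splits(scene_ids, n_train=12, repeats=1000, seed=0):
    """Yield (train_scenes, test_scenes) pairs, split by scene, not by image."""
    rng = random.Random(seed)
    scenes = sorted(set(scene_ids))
    for _ in range(repeats):
        train = set(rng.sample(scenes, n_train))
        yield train, set(scenes) - train

def median_score(scene_ids, evaluate, **kwargs):
    """Median of a metric over all random splits.

    `evaluate(train_scenes, test_scenes)` is a user-supplied function
    (hypothetical here) returning e.g. SROCC on the test scenes.
    """
    return median(evaluate(tr, te) for tr, te in scene_splits(scene_ids, **kwargs))

# Usage with a placeholder metric that just reports the test-set size.
splits = list(scene_splits(range(16), repeats=5))
placeholder = median_score(range(16), lambda tr, te: len(te), repeats=10)
```

Splitting by scene keeps all distorted versions of one reference image on the same side of the split, so the test content is never seen during training.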

Overall performance analysis
To verify the advantages of the proposed method, this section compares the proposed algorithm with the existing static image assessment algorithms SSIM [3] and BRISQUE [8], and the panoramic image quality assessment algorithms S-PSNR [4], WS-PSNR [5], CPP-PSNR [6] and WS-SSIM [7]. The experimental results are shown in Table 2. From the results, we can draw the following three conclusions. 1) The PSNR-based algorithms operate at the pixel level, so they fail to consider the correlation between pixels. As illustrated in Table 2, the highest SROCC among these algorithms is 0.8760, achieved by WS-PSNR; 2) The SSIM-based algorithms consider the image structure, but the nonlinear structural distortion caused by projection has a great influence on the calculation of SSIM. WS-SSIM, the best SSIM-based algorithm, is still insufficiently effective; 3) The proposed algorithm globally judges the degree of distortion according to the correlation of adjacent pixels, so it achieves the highest performance. In short, the algorithm with the proposed MRAPC feature is superior to the other panoramic image quality assessment algorithms.
In addition, because CVIQD has richer scenes, we select it to illustrate algorithm performance intuitively. Fig 7 shows the scatter diagrams and fitting curves of different algorithms on CVIQD. Each scatter point pairs a subjective score with an objective score, and the curve is the fitted mapping between them; the closer the points lie to the fitting curve and the more evenly they are distributed around it, the better the algorithm performs. As can be seen from Fig 7, the quality scores predicted by the proposed algorithm have a stronger correlation with the MOS scores.

Performance analysis of different distortion types
To further demonstrate the accuracy of the proposed algorithm in terms of coding distortions, we additionally conduct experiments on the three kinds of compression distortion in CVIQD. The results in Table 3 show that the proposed algorithm is better than the other algorithms in nearly all cases for the JPEG, AVC and HEVC distortions. The only exception is the AVC distortion, where the PLCC score of our algorithm is slightly lower than that of the BRISQUE algorithm, but the difference of 0.0084 is negligible. This suggests that the proposed algorithm has the potential to guide improvements in panoramic image compression.

Ablation experiment
The ablation experiment is carried out to verify the effectiveness of the three residual calculation methods. Table 4 lists the results of the "first-order" MRAPC features, the "first-order + second-order" MRAPC features, and the "first-order + second-order + third-order" MRAPC features, respectively. It can be seen from Table 4 that the performance improves as higher-order difference operators are introduced. This verifies the effectiveness of the proposed multi-region adjacent pixels correlation feature.

Model verification experiment
To verify the generalization ability of our algorithm, we conduct a cross-database experiment on JPEG distorted images: in one case the JPEG distorted images in OIQA are used as the training set and those in CVIQD as the test set, and in the other case the roles are swapped. Table 5 shows the results in these two experimental cases. In both cases, the proposed algorithm achieves relatively high correlation and accuracy, which indicates that the proposed algorithm generalizes well.

Conclusion
This paper proposes an efficient no-reference objective quality assessment method for panoramic images. We comprehensively analyze the properties of panoramic images and find that the adjacent pixels correlation of a panoramic image is not only closely tied to the statistical characteristics of the image, but also sensitive to distortions. Based on this observation, we propose a no-reference quality assessment method for panoramic images based on the multi-region adjacent pixels correlation. First, the difference maps of the panoramic image are calculated by using the first-order, second-order and third-order difference operators, respectively. Then, each difference map is truncated, and the joint probability distribution of four adjacent pixels in the residual is calculated by using the fourth-order co-occurrence matrix. The probability distribution is input into the SVR model as a feature, and the final quality assessment score is obtained. The experimental results show that our algorithm achieves higher performance than the state-of-the-art algorithms compared.